Automated extraction of chemical structure information from digital raster images
Abstract
BACKGROUND: To search for chemical structures in research articles, diagrams or text representing molecules must be translated into a standard chemical file format compatible with cheminformatic search engines. However, chemical information in research articles is often presented as analog diagrams of chemical structures embedded in digital raster images. Several software systems have been developed to automate the analog-to-digital conversion of chemical structure diagrams in scientific research articles, but their algorithmic performance and utility in cheminformatic research have not been investigated.

RESULTS: This paper critically reviews these systems and reports our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams from research articles and converting them into standard, searchable chemical file formats. The basic algorithms for recognizing the lines and letters that represent bonds and atoms in chemical structure diagrams can be run independently, in sequence, from a graphical user interface, and the algorithm parameters can be readily changed, which facilitates additional development tailored to a specific chemical database annotation scheme. In comparisons with existing software programs such as OSRA, Kekule, and CLiDE on several sets of sample images from diverse sources, ChemReader outperformed the other systems in both the rate of correct outputs and the accuracy of extracted molecular substructure patterns.

CONCLUSION: The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating entries with published research articles. Its stable performance and high accuracy suggest that ChemReader may be suitable for annotating chemical databases with links to scientific research articles.
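To make the idea of turning recognized lines and letters into a searchable structure concrete, here is a minimal Python sketch. It is not ChemReader's algorithm, which the abstract does not specify. It assumes an earlier recognition step has already produced line segments and text labels with pixel coordinates, and the function name `build_molecular_graph`, the tolerances `tol` and `label_tol`, and the sample coordinates are all hypothetical. Segment endpoints that fall close together are merged into one atom, unlabeled junctions are taken to be carbon (the usual convention in skeletal formulas), and each segment becomes a bond. Bond order, stereochemistry, and export to a chemical file format are left out.

```python
from math import hypot


def build_molecular_graph(segments, labels, tol=5.0, label_tol=12.0):
    """Turn detected line segments and text labels into (atoms, bonds).

    segments: list of (x1, y1, x2, y2) tuples, one per detected bond line.
    labels:   list of (text, x, y) tuples, one per recognized atom label.
    tol:       endpoints closer than this are treated as the same atom.
    label_tol: a label closer than this to an atom position names that atom.
    """
    nodes = []  # (x, y) position of each atom found so far

    def node_index(x, y):
        # Reuse an existing node if the point lies within the tolerance.
        for i, (nx, ny) in enumerate(nodes):
            if hypot(nx - x, ny - y) <= tol:
                return i
        nodes.append((x, y))
        return len(nodes) - 1

    bonds = set()
    for x1, y1, x2, y2 in segments:
        a = node_index(x1, y1)
        b = node_index(x2, y2)
        if a != b:  # skip degenerate segments that collapse to one node
            bonds.add((min(a, b), max(a, b)))

    atoms = []
    for x, y in nodes:
        symbol = "C"  # assumption: an unlabeled junction is a carbon atom
        for text, lx, ly in labels:
            if hypot(lx - x, ly - y) <= label_tol:
                symbol = text
                break
        atoms.append(symbol)

    return atoms, sorted(bonds)


# Hypothetical input: a skeletal drawing of ethanol with slightly noisy endpoints.
segments = [(0, 0, 20, 10), (21, 9, 40, 0)]
labels = [("OH", 48, 0)]
atoms, bonds = build_molecular_graph(segments, labels)
print(atoms)
print(bonds)
```

The two tolerances play the role of the adjustable algorithm parameters mentioned in the abstract: images drawn at a different resolution or line weight would need different values. The resulting atom and bond lists form a connection table, which is the information a standard chemical file format stores.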
Similar resources
Digital surface model extraction with high details using single high resolution satellite image and SRTM global DEM based on deep learning
The digital surface model (DSM) is an important product in the field of photogrammetry and remote sensing and has a variety of applications in this field. Existing techniques require more than one image for DSM extraction, and this paper investigates and analyzes the possibility of DSM extraction from a single satellite image. In this regard, an algorithm based on deep convolutional...
Novel Automated Method for Minirhizotron Image Analysis: Root Detection using Curvelet Transform
In this article a new method is introduced for distinguishing roots and background based on their digital curvelet transform in minirhizotron images. In the proposed method, the nonlinear mapping is applied on sub-band curvelet components followed by boundary detection using energy optimization concept. The curvelet transform has the excellent capability in detecting roots with different orient...
Automated Vectorization and Labeling of Very Large Raster Hypsographic Map Images Using Contour Graph
This paper presents a very efficient method for vectorizing and labeling very large raster hypsographic map images. By extending the contour tree (Freeman and Morse, 1967) to a contour graph, open contour lines and carrying contour lines can be handled. The ambiguities of the topological relationships among contour lines are resolved using map structure information, i.e., elevation tags and ind...
ARUBA and TOBAGO: new approaches for the generation of Cybercity from images
Automated methods for reliable 3-D reconstruction of man-made objects are essential to many users and providers of 3-D city data, including urban planners, architects, telecommunication and environmental engineers. Manual 3-D processing of aerial images is time-consuming and requires the expertise of qualified personnel. Therefore, the necessity to interpret, classify and quantitatively process ...
An Algorithm for Centreline Extraction Using Natural Neighbour Interpolation
Data caption and conversion are two of the most costly operations of any GIS, in terms of computer time and manual work needed for spatial data acquisition. They can represent up to 80 percent of the total implementation costs. Manual digitising is a very error prone and costly operation, especially due to the lack of explicit topology in commercial GIS systems. Indeed, each map update might re...
Journal title:
- Chemistry Central Journal
Volume 3, Issue
Pages -
Publication date 2009